The Advantage Learning Operator
Abstract
Value-based reinforcement learning typically involves the repeated application of an update rule, such as the Bellman operator $T_B$, to an action-value function. Recent work has explored alternative operators that remain optimality-preserving and may improve performance. In this report, I study in particular the advantage learning operator, $T_{AL} Q = T_B Q - \alpha (V - Q)$. A theoretical analysis of learning as an estimator of Q-value ordering shows that advantage learning may compensate for high Q-learning step sizes. I show further that advantage learning grants increased robustness to stochasticity, and discuss the importance of committing to current estimates of Q-value ordering and of increasing action gaps. Finally, I propose two algorithms for on-line optimization of the advantage learning parameter $\alpha$, demonstrating successful proofs-of-concept in simple MDPs.
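To make the operator concrete, here is a minimal tabular sketch in Python with NumPy. It is an illustration under stated assumptions, not the report's implementation: it assumes $V(x) = \max_a Q(x, a)$, a known transition tensor `P` and reward matrix `R`, and a hypothetical two-state toy MDP (`make_toy_mdp`) that does not come from the report. All function names are invented for this example.

```python
import numpy as np


def bellman_operator(Q, P, R, gamma):
    """Bellman optimality operator T_B for a tabular MDP.

    Q: (S, A) action values, P: (S, A, S) transition probabilities,
    R: (S, A) expected rewards, gamma: discount factor.
    """
    V = Q.max(axis=1)               # assumed definition: V(x) = max_a Q(x, a)
    return R + gamma * (P @ V)      # result has shape (S, A)


def advantage_learning_operator(Q, P, R, gamma, alpha):
    """Advantage learning operator: T_AL Q = T_B Q - alpha * (V - Q)."""
    V = Q.max(axis=1, keepdims=True)
    return bellman_operator(Q, P, R, gamma) - alpha * (V - Q)


def iterate(operator, Q0, n_iters=500):
    """Apply a one-argument operator repeatedly, starting from Q0."""
    Q = Q0.copy()
    for _ in range(n_iters):
        Q = operator(Q)
    return Q


def action_gap(Q):
    """Per-state difference between the best and second-best action value."""
    top_two = np.sort(Q, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]


def make_toy_mdp():
    """Hypothetical deterministic MDP: action 0 stays, action 1 switches state.

    Only staying in state 1 yields a reward (of 1).
    """
    P = np.zeros((2, 2, 2))
    for s in range(2):
        P[s, 0, s] = 1.0
        P[s, 1, 1 - s] = 1.0
    R = np.array([[0.0, 0.0], [1.0, 0.0]])
    return P, R


if __name__ == "__main__":
    P, R = make_toy_mdp()
    gamma, alpha = 0.9, 0.5
    Q0 = np.zeros((2, 2))
    Q_b = iterate(lambda Q: bellman_operator(Q, P, R, gamma), Q0)
    Q_al = iterate(lambda Q: advantage_learning_operator(Q, P, R, gamma, alpha), Q0)
    print("greedy policy (Bellman):", Q_b.argmax(axis=1))
    print("greedy policy (AL):     ", Q_al.argmax(axis=1))
    print("action gaps (Bellman):  ", action_gap(Q_b))
    print("action gaps (AL):       ", action_gap(Q_al))
```

On this toy problem both operators should select the same greedy actions, while the advantage learning iterates separate the best action from the rest by a wider margin. That is the sense in which the operator is optimality-preserving and gap-increasing.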